Building the Slovene Wordnet: First Steps, First Problems

Authors

  • Tomaž Erjavec
  • Darja Fišer
Abstract

We report on the prototype Slovene wordnet, which currently contains about 5,000 top-level concepts. The resource is based on the Serbian wordnet, which was automatically translated with the help of a bilingual dictionary; the resulting literals were ranked by frequency of corpus occurrence, and the results were manually corrected. The paper also discusses some problems encountered along the way and points out possibilities for automated acquisition and refinement of synsets in the future.
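
The translate-then-rank step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `translate_synset`, the dictionary layout, and all words and frequency counts below are made-up assumptions for illustration only. Each source literal is looked up in a bilingual dictionary, and the candidate target literals are ordered by how often they occur in a corpus, leaving the final choice to manual correction.

```python
from collections import Counter


def translate_synset(source_literals, bilingual_dict, corpus_freq):
    """Return candidate target literals for one synset, most frequent first.

    source_literals: literals of a source-language synset (hypothetical input)
    bilingual_dict:  maps a source word to a list of target translations
    corpus_freq:     Counter of target-word frequencies in some corpus
    """
    candidates = set()
    for literal in source_literals:
        # A word missing from the dictionary simply contributes no candidates.
        candidates.update(bilingual_dict.get(literal, []))
    # Sort by descending corpus frequency; break ties alphabetically
    # so the output is deterministic.
    return sorted(candidates, key=lambda word: (-corpus_freq[word], word))


# Toy data, invented for this example (not taken from the paper).
bilingual_dict = {"kuća": ["hiša", "dom"], "dom": ["dom", "domovanje"]}
corpus_freq = Counter({"hiša": 120, "dom": 300, "domovanje": 4})

ranked = translate_synset(["kuća", "dom"], bilingual_dict, corpus_freq)
print(ranked)
```

A ranked list like this only proposes candidates; as the abstract notes, the results were still corrected by hand.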

Similar resources

Using Multilingual Resources for Building SloWNet Faster

This project report presents the results of an approach in which synsets for Slovene wordnet were induced automatically from parallel corpora and already existing wordnets. First, multilingual lexicons were obtained from word-aligned corpora and compared to the wordnets in various languages in order to disambiguate lexicon entries. Then appropriate synset ids were attached to Slovene entries fr...

A Multilingual Approach to Building Slovene Wordnet

The paper presents an experiment in which synsets for Slovene wordnet were induced automatically from several multilingual resources. Our research is based on the assumption that translations are a plausible source of semantically relevant information. More specifically, we argue that the translational relation on the one hand reduces ambiguity of a source word and on the other conveys semantic...

Enriching Slovene WordNet with domain-specific terms

The paper describes an innovative approach to expanding the domain coverage of wordnet by exploiting multiple resources. In the experiment described here we are using a large monolingual Slovene corpus of texts from the domain of informatics to harvest terminology from, and a parallel English-Slovene corpus and an online dictionary as bilingual resources to facilitate the mapping of terms to th...

Learning to Mine Definitions from Slovene Structured and Unstructured Knowledge-Rich Resources

The paper presents an innovative approach to extract Slovene definition candidates from domain-specific corpora using morphosyntactic patterns, automatic terminology recognition and semantic tagging with wordnet senses. First, a classification model was trained on examples from Slovene Wikipedia which was then used to find well-formed definitions among the extracted candidates. The results of t...

Leveraging Parallel Corpora and Existing Wordnets for Automatic Construction of the Slovene Wordnet

The paper reports on a series of experiments conducted in order to test the feasibility of automatically generating synsets for Slovene wordnet. The resources used were the multilingual parallel corpus of George Orwell’s Nineteen Eighty-Four and wordnets for several languages. First, the corpus was word-aligned to obtain multilingual lexicons and then these lexicons were compared to the wordnet...

Publication date: 2005